Abstract

We show that testing inclusion between languages represented by
regular expressions with numerical occurrence indicators (#REs) is
NP-hard, even if the expressions satisfy the requirement of "unambiguity",
which is required for XML Schema content model expressions.



1 Proof of the result 

We have seen before [3] that testing for inclusion and overlap of languages
represented by #REs is NP-hard. Testing for overlap was shown to be hard also
for expressions that satisfy the XML requirement of "unambiguity". On
the other hand, the NP-hardness proof of #RE inclusion used ambiguous
expressions. Here we show that unambiguity does not make the testing of
inclusion essentially easier. The proof is based on a polynomial-time Turing
reduction [1, Chap. 5] from PARTITION, which is one of the best-known
NP-complete problems [2, 1].

Theorem 1.1 The #RE inclusion problem is NP-hard, also for unambiguous
#REs.

Proof. Let a set $A = \{a_1, \ldots, a_k\}$ and a positive integer weight $w(a)$ of each
$a \in A$ form an instance of PARTITION. The problem is to decide whether
$A$ can be split into two equal-weight subsets $A'$ and $A - A'$, that is, whether

$$\sum_{a \in A'} w(a) = \sum_{a \in A - A'} w(a) \qquad (1)$$

holds for some $A' \subseteq A$. Notice that (1) can hold only if the total weight of
the set $A$ is even. Therefore we can assume that $\sum_{a \in A} w(a) = 2n$ for some
positive integer $n$, which means that (1) holds if and only if

$$\sum_{a \in A'} w(a) = n \qquad (2)$$

for some $A' \subseteq A$.

For brevity, denote the weight $w(a_i)$ of an item $a_i \in A$ by $w_i$.

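Now form the following two #REs over the alphabet $\Sigma = \{a_0, a_1, \ldots, a_k\}$:

$$E_1 = a_0^{n+1..n+1} (a_1^{w_1..w_1} \mid \epsilon)(a_2^{w_2..w_2} \mid \epsilon) \cdots (a_k^{w_k..w_k} \mid \epsilon)$$
$$E_2 = ((a_0 \mid a_1 \mid \ldots \mid a_k)^{n+1..2n})^{1..2}$$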
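As an illustration, consider a small instance that is not taken from the text and is assumed here only for concreteness: $k = 3$ with weights $w_1 = 1$, $w_2 = 2$, $w_3 = 3$, so that the total weight is $2n = 6$ and $n = 3$. Under the construction above, the two expressions would read:

```latex
E_1 = a_0^{4..4} (a_1^{1..1} \mid \epsilon)(a_2^{2..2} \mid \epsilon)(a_3^{3..3} \mid \epsilon)
E_2 = ((a_0 \mid a_1 \mid a_2 \mid a_3)^{4..6})^{1..2}
```

In this instance the subset $\{a_1, a_2\}$ has weight $1 + 2 = 3 = n$, so it is a witness for (2).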

Notice that both expressions are trivially unambiguous since each symbol
of $\Sigma$ appears exactly once in both of them. Expression $E_1$ describes words
of the form $a_0^{n+1} u$, where the length of the suffix $u$ equals the total weight
of some subset of $A$. Therefore $L(E_1) \subseteq \{v \in \Sigma^* \mid n + 1 \le |v| \le 3n + 1\}$.
Obviously $E_1$ accepts a word of length $2n + 1$ if and only if a partition that
satisfies (2) exists. Expression $E_2$, on the other hand, rejects any words of
length $2n + 1$:

$$L(E_2) = \bigcup_{i=n+1}^{2n} \Sigma^i \;\cup\; \bigcup_{i=2n+2}^{4n} \Sigma^i
= \{v \in \Sigma^* \mid n + 1 \le |v| \le 4n,\ |v| \ne 2n + 1\}$$

Now $L(E_1) \subseteq L(E_2)$ holds iff $E_1$ does not accept any word of length $2n + 1$,
which holds if and only if no partition which satisfies (1) exists. □
So, a polynomial-time algorithm for testing the inclusion of unambiguous 
#REs would imply P = NP, which is considered most unlikely. 
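
The following Python sketch is not part of the proof. It is a brute-force sanity check of the construction on small instances, assuming an even total weight as in the proof. It relies on the fact stated above that $E_2$ accepts every word over the alphabet whose length is admissible, so the inclusion can be checked by comparing word lengths alone. All function names, and the brace syntax `a{m,n}` used to print occurrence indicators, are assumptions made for illustration.

```python
def subset_sums(weights):
    """All total weights obtainable from subsets of the given items."""
    sums = {0}
    for w in weights:
        sums |= {s + w for s in sums}
    return sums


def build_expressions(weights):
    """Render E1 and E2 as strings (hypothetical brace syntax for indicators)."""
    n = sum(weights) // 2
    e1 = f"a0{{{n + 1},{n + 1}}}" + "".join(
        f"(a{i}{{{w},{w}}}|ε)" for i, w in enumerate(weights, start=1)
    )
    symbols = "|".join(f"a{i}" for i in range(len(weights) + 1))
    e2 = f"(({symbols}){{{n + 1},{2 * n}}}){{1,2}}"
    return e1, e2


def e1_lengths(weights, n):
    """Lengths of words in L(E1): n+1 copies of a0 plus a subset weight."""
    return {n + 1 + s for s in subset_sums(weights)}


def e2_lengths(n):
    """Lengths of words in L(E2): n+1..2n and 2n+2..4n, so 2n+1 is missing."""
    return set(range(n + 1, 2 * n + 1)) | set(range(2 * n + 2, 4 * n + 1))


def e1_included_in_e2(weights):
    """Check L(E1) ⊆ L(E2) via word lengths (valid only for this construction)."""
    total = sum(weights)
    if total % 2 != 0:
        raise ValueError("the construction assumes an even total weight")
    n = total // 2
    return e1_lengths(weights, n) <= e2_lengths(n)


def has_partition(weights):
    """Direct test for PARTITION: some subset weighs exactly half the total."""
    total = sum(weights)
    return total % 2 == 0 and total // 2 in subset_sums(weights)
```

For instance, `e1_included_in_e2([1, 2, 3])` is `False` because the items of weight 1 and 2 form half of the total, whereas `e1_included_in_e2([1, 1, 4])` is `True` because no subset weighs 3. The check enumerates subset sums, so it only illustrates the equivalence and is not an efficient inclusion test.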